Covid 19 Analytics Dashboard¶

In this project we are going through the covid-19 data from the John Hopkins University to build a global analytics dashboard. This project is divided in 4 parts:

  • Setting up the data
  • Generating the views
  • Generating the dashboard
  • Conclusion

Setting up the data¶

We start our project by loading the needed packages and data.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import dash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output
import dash_bootstrap_components as dbc
from datetime import datetime, timedelta
import pycountry

covid_df = pd.read_csv('data/covid_tidy_data.csv')
covid_df.tail()
Out[1]:
Country Date Cases_Confirmed New_Cases_Confirmed Cases_Death New_Cases_Death Cases_Recovered New_Cases_Recovered Code
225166 Zimbabwe 2023-03-05 264127 0 5668 0 0 0 ZWE
225167 Zimbabwe 2023-03-06 264127 0 5668 0 0 0 ZWE
225168 Zimbabwe 2023-03-07 264127 0 5668 0 0 0 ZWE
225169 Zimbabwe 2023-03-08 264276 149 5671 3 0 0 ZWE
225170 Zimbabwe 2023-03-09 264276 0 5671 0 0 0 ZWE
In [2]:
pop_df = pd.read_csv('data/world_population.csv')
pop_df.tail()
Out[2]:
Rank CCA3 Country/Territory Capital Continent 2022 Population 2020 Population 2015 Population 2010 Population 2000 Population 1990 Population 1980 Population 1970 Population Area (km²) Density (per km²) Growth Rate World Population Percentage
229 226 WLF Wallis and Futuna Mata-Utu Oceania 11572 11655 12182 13142 14723 13454 11315 9377 142 81.4930 0.9953 0.00
230 172 ESH Western Sahara El Aaiún Africa 575986 556048 491824 413296 270375 178529 116775 76371 266000 2.1654 1.0184 0.01
231 46 YEM Yemen Sanaa Asia 33696614 32284046 28516545 24743946 18628700 13375121 9204938 6843607 527968 63.8232 1.0217 0.42
232 63 ZMB Zambia Lusaka Africa 20017675 18927715 16248230 13792086 9891136 7686401 5720438 4281671 752612 26.5976 1.0280 0.25
233 74 ZWE Zimbabwe Harare Africa 16320537 15669666 14154937 12839771 11834676 10113893 7049926 5202918 390757 41.7665 1.0204 0.20

The covid_df already contains a tidy dataset and we went to this process in another project that can be seen here. Now let us merge the dataframes. We will only be using the 2020 population, since it was the year of the pandemic start and its peak.

In [3]:
covid_df = covid_df.merge(pop_df[['CCA3', '2020 Population']], left_on='Code', right_on='CCA3')
covid_df.tail()
Out[3]:
Country Date Cases_Confirmed New_Cases_Confirmed Cases_Death New_Cases_Death Cases_Recovered New_Cases_Recovered Code CCA3 2020 Population
222880 Zimbabwe 2023-03-05 264127 0 5668 0 0 0 ZWE ZWE 15669666
222881 Zimbabwe 2023-03-06 264127 0 5668 0 0 0 ZWE ZWE 15669666
222882 Zimbabwe 2023-03-07 264127 0 5668 0 0 0 ZWE ZWE 15669666
222883 Zimbabwe 2023-03-08 264276 149 5671 3 0 0 ZWE ZWE 15669666
222884 Zimbabwe 2023-03-09 264276 0 5671 0 0 0 ZWE ZWE 15669666

We ended up with less rows than before because some countries lack of entries in the countries population file. We can't get the incidence, mortality and lethality rates since those will have to be calculated according to the appied filters in the dashboard. To finish this first part, let us change the Date column type, get rid of the CCA3 column and alter the name of the population column.

In [4]:
covid_df['Date'] = pd.to_datetime(covid_df['Date'])
covid_df = covid_df.drop('CCA3', axis=1)
covid_df = covid_df.rename(columns={'2020 Population':'Population'})
covid_df.tail()
Out[4]:
Country Date Cases_Confirmed New_Cases_Confirmed Cases_Death New_Cases_Death Cases_Recovered New_Cases_Recovered Code Population
222880 Zimbabwe 2023-03-05 264127 0 5668 0 0 0 ZWE 15669666
222881 Zimbabwe 2023-03-06 264127 0 5668 0 0 0 ZWE 15669666
222882 Zimbabwe 2023-03-07 264127 0 5668 0 0 0 ZWE 15669666
222883 Zimbabwe 2023-03-08 264276 149 5671 3 0 0 ZWE 15669666
222884 Zimbabwe 2023-03-09 264276 0 5671 0 0 0 ZWE 15669666
In [5]:
covid_df.to_csv('data/final_dataset.csv', index=False)

Generating the views¶

In this part of the project, we are going to generate the views that will be later used in the dashboard. We are going to write several functions in this part, so they can be reused later. Let us start by getting some KPIs with general information.

In [6]:
def get_cases(df):
    return df['New_Cases_Confirmed'].sum()
def get_deaths(df):
    return df['New_Cases_Death'].sum()
def get_incidence(df):
    return round(get_cases(df)/df.groupby('Country')['Population'].max().sum(), 3)
def get_mortality(df):
    return round(get_deaths(df)/df.groupby('Country')['Population'].max().sum(), 3)
def get_lethality(df):
    return round(get_deaths(df)/get_cases(df), 3)

def get_kpi(indicator, title):
    fig = go.Figure(go.Indicator(
        mode = "number",
        value = indicator,
        number = {'font':{'size':36}, 'font_color':'black'},
        title = {'text': title, 'font_size':16, 'font_color':'black'},
        domain = {'x': [0, 1], 'y': [0, 1]}
    ),layout= go.Layout(height=50))
    return fig
In [7]:
fig = get_kpi(get_cases(covid_df), 'Cases')
fig.show()
In [8]:
fig = get_kpi(get_deaths(covid_df.query("Country == 'Brazil'")), 'Deaths in Brazil')
fig.show()
In [9]:
fig = get_kpi(get_incidence(covid_df.query("Country == 'US'")), 'Incidence')
fig.show()
In [10]:
fig = get_kpi(get_mortality(covid_df.query("Country == 'San Marino'")), 'Mortality')
fig.show()
In [11]:
fig = get_kpi(get_lethality(covid_df.query("Date < '2021-01-01'")), 'Lethality in 2020')
fig.show()

Now let us get bubble map with tha case as the bubble size and the deaths as its color. We are also changing the numbers format in order to make them more readable.

In [12]:
def cases_deaths_by_country(df):
    cases_suffix = ''
    deaths_suffix = ''
    temp_df = df.groupby(['Code', 'Country'])[['New_Cases_Confirmed', 'New_Cases_Death']].sum().reset_index()
    if temp_df['New_Cases_Confirmed'].max()>10**6:
        cases_suffix = 'M'
        temp_df['New_Cases_Confirmed'] = temp_df['New_Cases_Confirmed']/10**6
    elif temp_df['New_Cases_Confirmed'].max()>10**3:
        cases_suffix = 'K'
        temp_df['New_Cases_Confirmed'] = temp_df['New_Cases_Confirmed']/10**3
    if temp_df['New_Cases_Death'].max()>10**6:
        deaths_suffix = 'M'
        temp_df['New_Cases_Death'] = temp_df['New_Cases_Death']/10**6
    elif temp_df['New_Cases_Confirmed'].max()>10**3:
        deaths_suffix = 'K'
        temp_df['New_Cases_Death'] = temp_df['New_Cases_Death']/10**3
    return [temp_df, cases_suffix, deaths_suffix]
    
    

def get_bubble_map(df_suffixex, title):
    df = df_suffixex[0]
    cases_suffix = df_suffixex[1]
    deaths_suffix = df_suffixex[2]
    fig = px.scatter_geo(df, locations="Code", color="New_Cases_Confirmed",
                     hover_name="Country",
                     hover_data={"Code":False,
                                 "New_Cases_Confirmed":":.2f",
                                 "New_Cases_Death":":.2f"},
                     size="New_Cases_Death",
                     color_continuous_scale=px.colors.sequential.YlOrRd,
                     title = title,
                     size_max = 40,
                     labels = {"New_Cases_Confirmed":"Cases ({})".format(cases_suffix),
                               "New_Cases_Death":"Deaths ({})".format(deaths_suffix)},
                     projection="natural earth")
    return fig
In [13]:
get_bubble_map(cases_deaths_by_country(covid_df), "Cases and Deaths Around the World")

Now, to finish this part, let us get some line charts to get the pandemic evolution over time.

In [14]:
def get_cases_day(df):
    return df.groupby('Date')['New_Cases_Confirmed'].sum().reset_index()
def get_deaths_day(df):
    return df.groupby('Date')['New_Cases_Death'].sum().reset_index()

def get_line_chart(df, title, column):
    fig = px.line(df, x='Date', y=column, labels={column:title}, title=title+" by Day")
    return fig
In [15]:
get_line_chart(get_cases_day(covid_df), 'Cases', 'New_Cases_Confirmed')
In [16]:
get_line_chart(get_deaths_day(covid_df.query('Country=="Brazil"')), 'Deaths', 'New_Cases_Death')

Generating the dashboard¶

Our dashboard will have two filters. One range slider selector for the date interval and one dropdown selector for the country. Since the Dash's range slide does not work with dates, we are going to create a dictionary to enable displaying the dates.

In [17]:
date_dict = {}
for i,date in enumerate(covid_df['Date'].unique()):
    date_dict[i]=str(pd.to_datetime(date).date().strftime('%m/%d/%Y'))
list(date_dict.items())[-6:-1]
Out[17]:
[(1137, '03/04/2023'),
 (1138, '03/05/2023'),
 (1139, '03/06/2023'),
 (1140, '03/07/2023'),
 (1141, '03/08/2023')]

Now we are going to create the countries filter options. This filter must include an "All" button, in order to allow us select all the countries.

In [18]:
countries = ['All']
countries.extend(list(covid_df['Country'].unique()))
countries[0:5]
Out[18]:
['All', 'Afghanistan', 'Albania', 'Algeria', 'Andorra']

Now we are ready to generate our dashboard.

In [23]:
app = dash.Dash(external_stylesheets=[dbc.themes.CERULEAN])

app.layout = dbc.Container([
    dbc.Row(dbc.Col(html.H1("Covid-19 Analytics Dashboard"), md=10), align="start", justify="evenly", style={"height": 100}),
    dbc.Row([
        dbc.Col(html.Div("Select the country:"), md=2),
        dbc.Col(html.Div("Select the period:"), md=8)
        
    ], align="start", justify="evenly"),
    dbc.Row([
        dbc.Col(dcc.Dropdown(id='my-dropdown',
                    options=countries,
                     multi=False,
                     clearable=False,
                     value='All',
                    ), md=2),
        dbc.Col(dcc.RangeSlider(min=0,
                   max=1142,
                   value=[0, 1142],
                   marks={0:date_dict[0], 366:date_dict[366], 366+365:date_dict[366+365], 1142:date_dict[1142] },
                    id='my-slider'), md=8)
    ], align="start", justify="evenly"),
    dbc.Row([
        dbc.Col(dcc.Graph(id='cases_kpi', figure={}, responsive=True,  style={"height": "100%"}), md=2, style={"height": "100%"}),
        dbc.Col(dcc.Graph(id='deaths_kpi', figure={}, responsive=True, style={"height": "100%"}), md=2, style={"height": "100%"}),
        dbc.Col(dcc.Graph(id='incidence_kpi', figure={}, responsive=True, style={"height": "100%"}), md=2, style={"height": "100%"}),
        dbc.Col(dcc.Graph(id='mortality_kpi', figure={}, responsive=True, style={"height": "100%"}), md=2, style={"height": "100%"}),
        dbc.Col(dcc.Graph(id='lethality_kpi', figure={}, responsive=True,  style={"height": "100%"}), md=2, style={"height": "100%"})
    ], align="start", justify="evenly", style={"height": 100}),
    dbc.Row([
        dbc.Col(dcc.Graph(id='map', figure={}, responsive=True), md=10)
    ], align="start", justify="evenly"),
    dbc.Row([
        dbc.Col(dcc.Graph(id='cases_line', figure={}, responsive=True), md=10)
    ], align="start", justify="evenly"),
    dbc.Row([
        dbc.Col(dcc.Graph(id='deaths_line', figure={}, responsive=True), md=10)
    ], align="start", justify="evenly")
], fluid=True)

@app.callback(
    Output('cases_kpi', 'figure'),
    Output('deaths_kpi', 'figure'),
    Output('incidence_kpi', 'figure'),
    Output('mortality_kpi', 'figure'),
    Output('lethality_kpi', 'figure'),
    Output('map', 'figure'),
    Output('cases_line', 'figure'),
    Output('deaths_line', 'figure'),
    Input('my-slider', 'value'),
    Input('my-dropdown', 'value')
)
def update_output(period, country):
    dt_ini=pd.to_datetime(date_dict[period[0]])
    dt_fin=pd.to_datetime(date_dict[period[1]])
    query = '(Date>=@dt_ini)&(Date<=@dt_fin)'
    if country != 'All':
        query = query + '&(Country==@country)'
    fig_cases = get_kpi(get_cases(covid_df.query(query)), 'Cases')
    fig_deaths = get_kpi(get_deaths(covid_df.query(query)), 'Deaths')
    fig_incidence = get_kpi(get_incidence(covid_df.query(query)), 'Incidence')
    fig_mortality = get_kpi(get_mortality(covid_df.query(query)), 'Mortality')
    fig_lethality = get_kpi(get_lethality(covid_df.query(query)), 'Lethality')
    fig_map = get_bubble_map(cases_deaths_by_country(covid_df.query(query)), 'Cases and Deaths by Country')
    
    fig_cases_line = get_line_chart(get_cases_day(covid_df.query(query)), 'Cases', 'New_Cases_Confirmed')
    fig_deaths_line = get_line_chart(get_deaths_day(covid_df.query(query)), 'Deaths', 'New_Cases_Death')
    return fig_cases, fig_deaths, fig_incidence, fig_mortality, fig_lethality, fig_map, fig_cases_line, fig_deaths_line


app.run(debug=True, port=8051)

Conclusion¶

In this project, we have created an interactive analytics dashboard with covid-19 around the world data using the Dash package. Even though the package doesn't offer a lot of options and it is not as simple to use as some dataviz tools such as Tableau and Power BI, it pays off for it being free and very flexible.